Training Neural Networks Without Gradients: A Scalable ADMM Approach

Authors

  • Gavin Taylor
  • Ryan Burmeister
  • Zheng Xu
  • Bharat Singh
  • Ankit Patel
  • Tom Goldstein
Abstract

With the growing importance of large network models and enormous training datasets, GPUs have become increasingly necessary to train neural networks. This is largely because conventional optimization algorithms rely on stochastic gradient methods that don’t scale well to large numbers of cores in a cluster setting. Furthermore, the convergence of all gradient methods, including batch methods, suffers from common problems like saturation effects, poor conditioning, and saddle points. This paper explores an unconventional training method that uses alternating direction methods and Bregman iteration to train networks without gradient descent steps. The proposed method reduces the network training problem to a sequence of minimization substeps that can each be solved globally in closed form. The proposed method is advantageous because it avoids many of the caveats that make gradient methods slow on highly non-convex problems. The method exhibits strong scaling in the distributed setting, yielding linear speedups even when split over thousands of cores.
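To make the alternating-minimization idea concrete, below is a minimal NumPy sketch for a single-hidden-layer ReLU network: the pre-activations and activations are split into auxiliary variables, and each sub-problem is solved in closed form (least squares for the weights and activations, a pointwise update for the ReLU split). Function and parameter names (train_admm_sketch, gamma, rho) are illustrative assumptions; the paper's full method additionally carries a Bregman/Lagrange-multiplier update on the output layer and per-layer penalty weights, which are omitted here.

```python
# Simplified, gradient-free alternating minimization in the spirit of the
# paper. This is an illustrative toy, not the authors' exact algorithm.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def train_admm_sketch(X, Y, hidden=64, iters=50, gamma=1.0, rho=1.0, seed=0):
    """X: (d, n) inputs, Y: (c, n) targets. Returns (W1, W2)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    c = Y.shape[0]
    W1 = rng.standard_normal((hidden, d)) * 0.1
    W2 = rng.standard_normal((c, hidden)) * 0.1
    z1 = W1 @ X                      # hidden pre-activations (split variable)
    a1 = relu(z1)                    # hidden activations (split variable)
    for _ in range(iters):
        # Weight updates: unconstrained least squares, solved in closed form.
        W1 = z1 @ np.linalg.pinv(X)
        W2 = Y @ np.linalg.pinv(a1)  # fit the output layer to the targets
        # Activation update: trade off fitting the next layer against relu(z1),
        # i.e. minimize gamma*||Y - W2 a||^2 + rho*||a - relu(z1)||^2 columnwise.
        A = gamma * (W2.T @ W2) + rho * np.eye(hidden)
        b = gamma * (W2.T @ Y) + rho * relu(z1)
        a1 = np.linalg.solve(A, b)
        # Pre-activation update: minimize ||a1 - relu(z)||^2 + ||z - W1 X||^2
        # elementwise; with ReLU this 1D problem has a closed-form solution.
        p = W1 @ X
        z_pos = np.maximum((a1 + p) / 2.0, 0.0)   # best candidate with z >= 0
        z_neg = np.minimum(p, 0.0)                # best candidate with z < 0
        cost_pos = (a1 - relu(z_pos))**2 + (z_pos - p)**2
        cost_neg = (a1 - relu(z_neg))**2 + (z_neg - p)**2
        z1 = np.where(cost_pos <= cost_neg, z_pos, z_neg)
    return W1, W2
```

Because every update is a least-squares solve or an elementwise rule, no step requires backpropagated gradients, and in a distributed setting the per-sample blocks of z1 and a1 can be updated independently across workers.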


Related articles

Scalable Stochastic Alternating Direction Method of Multipliers

Alternating direction method of multipliers (ADMM) has been widely used in many applications due to its strong performance on complex regularization problems and large-scale distributed optimization problems. Stochastic ADMM, which visits only one sample or a mini-batch of samples at a time, has recently been shown to achieve better performance than batch ADMM. However, most stochasti...
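The stochastic-ADMM idea summarized above (one mini-batch per update, with closed-form splitting and dual steps) can be illustrated on a simple l1-regularized least-squares problem. The sketch below is generic: the function names (stochastic_admm, soft_threshold), the squared loss, and the step sizes are assumptions for illustration, not the specific algorithm of the cited paper.

```python
# Generic stochastic ADMM for  min_x (1/n) * sum_i 0.5*(a_i^T x - y_i)^2 + lam*||x||_1,
# split as x - z = 0. The loss is handled with a mini-batch gradient plus a
# proximal term, so each step touches only a few samples.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_admm(A, y, lam=0.1, rho=1.0, eta=0.01, batch=32, epochs=5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d); z = np.zeros(d); u = np.zeros(d)   # primal, split, scaled dual
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(n // batch, 1)):
            Ab, yb = A[idx], y[idx]
            g = Ab.T @ (Ab @ x - yb) / len(idx)          # mini-batch gradient at x
            # x-update: linearized loss + proximal term + augmented penalty,
            # solved in closed form.
            x = (x / eta + rho * (z - u) - g) / (1.0 / eta + rho)
            # z-update: proximal operator of the l1 norm (soft-thresholding).
            z = soft_threshold(x + u, lam / rho)
            # Scaled dual update enforcing x = z.
            u = u + x - z
    return z
```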


Distributed asynchronous optimization of convolutional neural networks

Recently, deep Convolutional Neural Networks have been shown to outperform Deep Neural Networks for acoustic modelling, producing state-of-the-art accuracy in speech recognition tasks. Convolutional models provide increased model robustness through the use of pooling invariance and weight sharing across spectrum and time. However, training convolutional models is a very computationally expens...


Supervised Hashing with Deep Neural Networks

In this paper, we propose training very deep neural networks (DNNs) for supervised learning of hash codes. Existing methods in this context train relatively "shallow" networks, limited by the issues arising in backpropagation (e.g. vanishing gradients) as well as by computational efficiency. We propose a novel and efficient training algorithm inspired by alternating direction method of multipliers...


Asynchronous Distributed Neural Network Training using Alternating Direction Method of Multipliers

Since the first appearance of a large-scale dataset [4] and powerful computational resources such as GPUs, Convolutional Neural Networks (CNNs) have become the essential machine learning algorithm for image classification, detection, and many other applications. As the popularity of CNNs has increased, their size has increased as well [10, 13]. (For instance, AlexNet has more than 60 million parameters.) The perfo...


Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks

Large multilayer neural networks trained with backpropagation have recently achieved state-of-the-art results in a wide range of problems. However, using backprop for neural net learning still has some disadvantages, e.g., having to tune a large number of hyperparameters to the data, lack of calibrated probabilistic predictions, and a tendency to overfit the training data. In principle, the Baye...



Publication date: 2016